We studied converting network packets to bitmap images so that a 2D-CNN model can categorize network flows as benign or malicious at or near the beginning of each flow. This data repository contains all of the data samples used in our 2D-CNN analyses, with both one and two packets per flow. Each folder corresponds to a different analysis scenario, and each pair of files within a folder corresponds to one individual run of that analysis. Further details about this research can be found in the paper associated with this repository:
https://doi.org/10.3390/network2040036

Each file within a folder is named CNN_1-pkt* or CNN_2-pkt*, indicating whether that file's analysis included one or two packets per flow, respectively. Each run consists of a .pcap file and a .csv file that share the same name.
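The naming convention above can be used to match the two files of each run. The following is a minimal sketch, not part of the original tooling: the function names pair_runs and packets_per_flow are hypothetical, and the only assumptions are the CNN_1-pkt / CNN_2-pkt prefixes and the .pcap / .csv extensions described above.

```python
from pathlib import Path


def pair_runs(folder):
    """Return {run_name: (pcap_path, csv_path)} for runs that have both files.

    Hypothetical helper: it relies only on the shared base name and the
    .pcap / .csv extensions described in this README.
    """
    folder = Path(folder)
    pcaps = {p.stem: p for p in folder.glob("CNN_*-pkt*.pcap")}
    csvs = {p.stem: p for p in folder.glob("CNN_*-pkt*.csv")}
    shared = sorted(pcaps.keys() & csvs.keys())
    return {name: (pcaps[name], csvs[name]) for name in shared}


def packets_per_flow(run_name):
    """Infer one- or two-packet analysis from the file-name prefix."""
    if run_name.startswith("CNN_1-pkt"):
        return 1
    if run_name.startswith("CNN_2-pkt"):
        return 2
    raise ValueError(f"unexpected run name: {run_name!r}")
```

Calling pair_runs on one scenario folder returns only the runs for which both files are present, so an incomplete download shows up as a missing key rather than an error.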

The .pcap file includes all of the raw, unsanitized packets included in that analysis. The .csv file contains the actual data given to the 2D-CNN model, including hex-encoded sanitized bitmaps containing one or two packets of data. Each entry in the .csv corresponds to one packet or pair of packets in the corresponding .pcap file.

For one-packet analysis, the .csv entries correspond to the packets in the .pcap on a 1-to-1 basis, in the same order. For two-packet analysis, each .csv entry corresponds to one or two consecutive packets, in order. If a 5-tuple has only one packet, there was no second packet for that flow in the original dataset, and the corresponding .csv entry contains a zero-padded second packet instead.
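For one-packet runs, the 1-to-1 correspondence means the number of .csv entries should equal the number of packets in the .pcap. The sketch below counts packet records using only the Python standard library. It assumes the captures are in the classic libpcap file format (not pcapng), which this README does not state; the function name count_pcap_packets is hypothetical.

```python
import struct


def count_pcap_packets(data):
    """Count packet records in a classic libpcap capture held in memory.

    Assumption: classic pcap layout (24-byte global header, then a
    16-byte record header before each packet). pcapng is not handled.
    """
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond stamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file")

    offset, count = 24, 0
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, _orig = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16 + incl_len
        if offset > len(data):
            raise ValueError("truncated packet record")
        count += 1
    return count
```

For a one-packet run, count_pcap_packets(open(path, "rb").read()) can be compared against the number of data rows in the matching .csv. For two-packet runs the counts will generally differ, because an entry may cover one or two packets.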

The columns of the .csv are:
dataset, label, subset, image

The dataset column identifies the dataset from which the sample originated.

The label column refers to the category label for that sample.

The subset column indicates whether the sample was used for training (0) or testing (1).

The image column contains the hex-encoded bitmap image of that sample, which was input to the 2D-CNN model.
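The column layout above can be read with the Python standard library. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the code tolerates a header row because this README does not say whether one is present, and the image field is assumed to be a plain hex string without a 0x prefix. The bitmap dimensions and pixel layout are not given here (see the paper), so decode_image returns raw bytes unless the caller supplies a row width.

```python
import csv

COLUMNS = ["dataset", "label", "subset", "image"]


def read_samples(lines):
    """Yield one dict per .csv entry from an iterable of text lines."""
    for row in csv.reader(lines):
        if not row:
            continue
        cells = [c.strip() for c in row]
        if [c.lower() for c in cells] == COLUMNS:
            continue  # skip a header row if the file has one
        yield dict(zip(COLUMNS, cells))


def split_by_subset(samples):
    """Split samples into (training, testing) using subset 0 / 1."""
    train, test = [], []
    for sample in samples:
        subset = int(sample["subset"])
        if subset == 0:
            train.append(sample)
        elif subset == 1:
            test.append(sample)
        else:
            raise ValueError(f"unexpected subset value: {subset}")
    return train, test


def decode_image(hex_string, width=None):
    """Decode a hex-encoded image field.

    Returns raw bytes, or a list of rows of byte values when a row
    width (in bytes) is supplied by the caller. The width is an
    assumption the caller must take from the paper, not from this file.
    """
    raw = bytes.fromhex(hex_string.strip())
    if width is None:
        return raw
    if len(raw) % width != 0:
        raise ValueError("image length is not a multiple of the row width")
    return [list(raw[i:i + width]) for i in range(0, len(raw), width)]
```

A typical use is to open a .csv with open(path, newline=""), pass the file object to read_samples, and then call split_by_subset on the result to recover the training and testing samples of that run.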

Authors: Garett Fox, Rajendra V. Boppana